NSF PAR Search | NSF Public Access Repository

SSA: a novel method for Single-cell and Spatial transcriptomics Alignment

https://doi.org/10.29007/9cr1

Tran, Bang; Tran, Dao; Nguyen, Tin (July 2024, EPiC Series in Computing)

Single-cell RNA sequencing (scRNA-seq) provides expression profiles of individual cells but fails to preserve crucial spatial information. On the other hand, spatial transcriptomics technologies are able to analyze specific regions within tissue sections but lack the capability to examine them at single-cell resolution. To overcome these issues, we present Single-cell and Spatial transcriptomics Alignment (SSA), a novel technique that employs an optimal transport algorithm to assign individual cells from a scRNA-seq atlas to their spatial locations in actual tissue based on their expression profiles. SSA has demonstrated superior performance compared to existing methods (SpaOTsc, Tangram, Seurat and DistMap) using 10 semi-simulated datasets generated from a high-resolution spatial transcriptomics human breast cancer dataset with 100,064 cells. This advancement provides a refined tool for researchers to delve deeper into the relationship between cellular spatial organization and gene expression.
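The abstract does not spell out SSA's cost function or its exact optimal transport formulation, so the following is only a generic sketch of the idea of assigning cells to locations by optimal transport. It assumes a correlation-based cost, uniform marginals, and entropy-regularized (Sinkhorn) iterations; the names `sinkhorn_plan` and `correlation_cost` and the toy data are hypothetical and are not taken from the SSA software.

```python
import numpy as np

def sinkhorn_plan(cost, epsilon=0.05, n_iter=500):
    """Entropy-regularized optimal transport plan with uniform marginals."""
    n_cells, n_spots = cost.shape
    a = np.full(n_cells, 1.0 / n_cells)   # mass carried by each cell
    b = np.full(n_spots, 1.0 / n_spots)   # mass accepted by each location
    kernel = np.exp(-cost / epsilon)      # Gibbs kernel of the cost matrix
    u = np.ones(n_cells)
    v = np.ones(n_spots)
    for _ in range(n_iter):
        u = a / (kernel @ v)              # rescale rows toward marginal a
        v = b / (kernel.T @ u)            # rescale columns toward marginal b
    return u[:, None] * kernel * v[None, :]

def correlation_cost(cells, spots):
    """Cost = 1 - Pearson correlation between expression profiles (rows)."""
    c = cells - cells.mean(axis=1, keepdims=True)
    s = spots - spots.mean(axis=1, keepdims=True)
    c /= np.linalg.norm(c, axis=1, keepdims=True)
    s /= np.linalg.norm(s, axis=1, keepdims=True)
    return 1.0 - c @ s.T

# Toy data (assumption): 4 locations x 20 genes, and 8 cells that are
# noisy copies of the location profiles.
rng = np.random.default_rng(0)
spots = rng.random((4, 20))
true_location = np.array([0, 1, 2, 3, 0, 1, 2, 3])
cells = spots[true_location] + 0.01 * rng.normal(size=(8, 20))

plan = sinkhorn_plan(correlation_cost(cells, spots))
assigned = plan.argmax(axis=1)            # most likely location per cell
```

Each row of `plan` can be read as a soft assignment of one cell over all locations; taking the row-wise maximum is one simple way to turn it into a hard assignment.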
Full Text Available
CCPA: cloud-based, self-learning modules for consensus pathway analysis using GO, KEGG and Reactome

https://doi.org/10.1093/bib/bbae222

Nguyen, Ha; Pham, Van-Dung; Nguyen, Hung; Tran, Bang; Petereit, Juli; Nguyen, Tin (July 2024, Briefings in Bioinformatics)

This manuscript describes the development of a resource module that is part of a learning platform named ‘NIGMS Sandbox for Cloud-based Learning’ (https://github.com/NIGMS/NIGMS-Sandbox). The module delivers learning materials on Cloud-based Consensus Pathway Analysis in an interactive format that uses appropriate cloud resources for data access and analyses. Pathway analysis is important because it allows us to gain insights into biological mechanisms underlying conditions. However, the availability of many pathway analysis methods, the requirement of coding skills, and the focus of current tools on only a few species all make it very difficult for biomedical researchers to self-learn and perform pathway analysis efficiently. Furthermore, there is a lack of tools that allow researchers to compare analysis results obtained from different experiments and different analysis methods to find consensus results. To address these challenges, we have designed a cloud-based, self-learning module that provides consensus results among established, state-of-the-art pathway analysis techniques to provide students and researchers with necessary training and example materials. The training module consists of five Jupyter Notebooks that provide complete tutorials for the following tasks: (i) process expression data, (ii) perform differential analysis, visualize and compare the results obtained from four differential analysis methods (limma, t-test, edgeR, DESeq2), (iii) process three pathway databases (GO, KEGG and Reactome), (iv) perform pathway analysis using eight methods (ORA, CAMERA, KS test, Wilcoxon test, FGSEA, GSA, SAFE and PADOG) and (v) combine results of multiple analyses. We also provide examples, source code, explanations and instructional videos for trainees to complete each Jupyter Notebook. The module supports the analysis of many model (e.g. human, mouse, fruit fly, zebrafish) and non-model species.
The module is publicly available at https://github.com/NIGMS/Consensus-Pathway-Analysis-in-the-Cloud. The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox [1] at the beginning of this Supplement.
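To make tasks (iv) and (v) more concrete, here is a minimal sketch of over-representation analysis (ORA) as a hypergeometric tail test, followed by one way of combining p-values across analyses. The abstract does not say how the module combines results, so Fisher's method is used here purely as an illustrative stand-in; the function names `ora_pvalue` and `fisher_combine` and the example numbers are hypothetical and do not come from the module's notebooks.

```python
from math import comb, exp, log

def ora_pvalue(n_universe, n_pathway, n_de, n_overlap):
    """Hypergeometric tail P(X >= n_overlap): the chance of seeing at least
    n_overlap pathway genes among n_de differentially expressed genes."""
    total = comb(n_universe, n_de)
    upper = min(n_pathway, n_de)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(log p) follows a chi-square distribution
    with 2k degrees of freedom, whose tail has a closed form for even df."""
    k = len(pvalues)
    half_stat = -sum(log(p) for p in pvalues)   # test statistic divided by 2
    term, series = 1.0, 0.0
    for i in range(k):
        series += term                          # adds half_stat**i / i!
        term *= half_stat / (i + 1)
    return exp(-half_stat) * series

# Hypothetical example: the same pathway tested in two experiments.
p_first = ora_pvalue(n_universe=1000, n_pathway=50, n_de=100, n_overlap=12)
p_second = ora_pvalue(n_universe=1000, n_pathway=50, n_de=80, n_overlap=9)
p_combined = fisher_combine([p_first, p_second])
```

Fisher's method assumes the combined p-values are independent, which is a strong assumption when the analyses share samples or methods.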
RCPA: An Open‐Source R Package for Data Processing, Differential Analysis, Consensus Pathway Analysis, and Visualization

https://doi.org/10.1002/cpz1.1036

Nguyen, Hung; Nguyen, Ha; Maghsoudi, Zeynab; Tran, Bang; Draghici, Sorin; Nguyen, Tin (May 2024, Current Protocols)

Identifying impacted pathways is important because it provides insights into the biology underlying conditions beyond the detection of differentially expressed genes. Because of the importance of such analysis, more than 100 pathway analysis methods have been developed thus far. Despite the availability of many methods, it is challenging for biomedical researchers to learn and properly perform pathway analysis. First, the sheer number of methods makes it challenging to learn and choose the correct method for a given experiment. Second, computational methods require users to be savvy with coding syntax and comfortable with command‐line environments, areas that are unfamiliar to most life scientists. Third, as learning tools and computational methods are typically implemented only for a few species (i.e., human and some model organisms), it is difficult to perform pathway analysis on other species that are not included in many of the current pathway analysis tools. Finally, existing pathway tools do not allow researchers to combine, compare, and contrast the results of different methods and experiments for both hypothesis testing and analysis purposes. To address these challenges, we developed an open‐source R package for Consensus Pathway Analysis (RCPA) that allows researchers to conveniently: (1) download and process data from NCBI GEO; (2) perform differential analysis using established techniques developed for both microarray and sequencing data; (3) perform both gene set enrichment and topology‐based pathway analysis using different methods that seek to answer different research hypotheses; (4) combine methods and datasets to find consensus results; and (5) visualize analysis results and explore significantly impacted pathways across multiple analyses.
This protocol provides many example code snippets with detailed explanations and supports the analysis of more than 1000 species, two pathway databases, three differential analysis techniques, eight pathway analysis tools, six meta‐analysis methods, and two consensus analysis techniques. The package is freely available on the CRAN repository. © 2024 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Processing Affymetrix microarrays
Basic Protocol 2: Processing Agilent microarrays
Support Protocol: Processing RNA sequencing (RNA‐Seq) data
Basic Protocol 3: Differential analysis of microarray data (Affymetrix and Agilent)
Basic Protocol 4: Differential analysis of RNA‐Seq data
Basic Protocol 5: Gene set enrichment analysis
Basic Protocol 6: Topology‐based (TB) pathway analysis
Basic Protocol 7: Data integration and visualization
Full Text Available
A novel method for single-cell data imputation using subspace regression

https://doi.org/10.1038/s41598-022-06500-4

Tran, Duc; Tran, Bang; Nguyen, Hung; Nguyen, Tin (December 2022, Scientific Reports)

Recent advances in biochemistry and single-cell RNA sequencing (scRNA-seq) have allowed us to monitor biological systems at single-cell resolution. However, the low capture of mRNA material within individual cells often leads to inaccurate quantification of genetic material. Consequently, a significant number of expression values are reported as missing; these are often referred to as dropouts. To overcome this challenge, we develop a novel imputation method, named single-cell Imputation via Subspace Regression (scISR), that can reliably recover the dropout values of scRNA-seq data. The scISR method first uses a hypothesis-testing technique to identify zero-valued entries that are most likely affected by dropout events and then estimates the dropout values using a subspace regression model. Our comprehensive evaluation using 25 publicly available scRNA-seq datasets and various simulation scenarios against five state-of-the-art methods demonstrates that scISR outperforms other imputation methods in recovering scRNA-seq expression profiles. scISR consistently improves the quality of cluster analysis regardless of dropout rates, normalization techniques, and quantification schemes. The source code of scISR can be found on GitHub at https://github.com/duct317/scISR.
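The second step described above, estimating dropout values by regression, can be illustrated with a deliberately simplified sketch. It is not the scISR implementation: it skips the hypothesis test, treats every zero in the target gene as a dropout, and fits ordinary least squares on hand-picked predictor genes instead of a learned subspace. The function name `impute_gene` and the synthetic data are hypothetical.

```python
import numpy as np

def impute_gene(expr, target, predictors):
    """Fill zero entries of one gene (column `target` of a cells x genes
    matrix) with predictions from a linear model fitted on the cells
    where that gene was observed (nonzero)."""
    y = expr[:, target]
    observed = y > 0
    # Design matrix: intercept plus the predictor genes.
    design = np.column_stack([np.ones(len(y)), expr[:, predictors]])
    coef, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    filled = y.copy()
    # Expression cannot be negative, so clip predictions at zero.
    filled[~observed] = np.clip(design[~observed] @ coef, 0.0, None)
    return filled

# Synthetic data (assumption): gene 2 is an exact linear function of
# genes 0 and 1, and its first five values are zeroed out as "dropouts".
rng = np.random.default_rng(1)
base = rng.random((30, 2))
truth = 1.0 + 2.0 * base[:, 0] + 3.0 * base[:, 1]
expr = np.column_stack([base, truth])
expr[:5, 2] = 0.0

recovered = impute_gene(expr, target=2, predictors=[0, 1])
```

On real data the relationship between genes is noisy, and true biological zeros must be distinguished from dropouts, which is the role of the hypothesis-testing step that this sketch leaves out.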
Full Text Available
scIDS: Single-cell Imputation by combining Deep autoencoder neural networks and Subspace regression

https://doi.org/10.1109/KSE53942.2021.9648664

Tran, Bang; Nguyen, Quyen; Shrestha, Sangam; Nguyen, Tin (November 2021, 13th International Conference on Knowledge and Systems Engineering (KSE))

Full Text Available
SMRT: Randomized Data Transformation for Cancer Subtyping and Big Data Analysis

https://doi.org/10.3389/fonc.2021.725133

Nguyen, Hung; Tran, Duc; Tran, Bang; Roy, Monikrishna; Cassell, Adam; Dascalu, Sergiu; Draghici, Sorin; Nguyen, Tin (October 2021, Frontiers in Oncology)

Cancer is an umbrella term that includes a range of disorders, from those that are fast-growing and lethal to indolent lesions with low or delayed potential for progression to death. The treatment options, as well as treatment success, are highly dependent on the correct subtyping of individual patients. With the advancement of high-throughput platforms, we have the opportunity to differentiate among cancer subtypes from a holistic perspective that takes into consideration phenomena at different molecular levels (mRNA, methylation, etc.). This demands powerful integrative methods to leverage large multi-omics datasets for better subtyping. Here we introduce Subtyping Multi-omics using a Randomized Transformation (SMRT), a new method for multi-omics integration and cancer subtyping. SMRT offers the following advantages over existing approaches: (i) a scalable analysis pipeline that allows researchers to integrate multi-omics data and analyze hundreds of thousands of samples in minutes, (ii) the ability to integrate data types with different numbers of patients, (iii) the ability to analyze unmatched data of different types, and (iv) a convenient data analysis pipeline offered to users through a web application. We also improve the efficiency of our ensemble-based perturbation clustering to support analysis on machines with memory constraints. In an extensive analysis, we compare SMRT with eight state-of-the-art subtyping methods using 37 TCGA and two METABRIC datasets comprising a total of almost 12,000 patient samples from 28 different types of cancer. We also performed a number of simulation studies. We demonstrate that SMRT outperforms other methods in identifying subtypes with significantly different survival profiles. In addition, SMRT is extremely fast, being able to analyze hundreds of thousands of samples in minutes. The web application is available at http://SMRT.tinnguyen-lab.com.
The R package will be deposited to CRAN as part of our PINSPlus software suite.
Full Text Available
A comprehensive survey of regulatory network inference methods using single cell RNA sequencing data

https://doi.org/10.1093/bib/bbaa190

Nguyen, Hung; Tran, Duc; Tran, Bang; Pehlivan, Bahadir; Nguyen, Tin (May 2021, Briefings in Bioinformatics)
A gene regulatory network is a complicated set of interactions between genetic materials, which dictates how cells develop in living organisms and react to their surrounding environment. Robust comprehension of these interactions would help explain how cells function as well as predict their reactions to external factors. This knowledge can benefit both developmental biology and clinical research such as drug development or epidemiology research. Recently, the rapid advance of single-cell sequencing technologies, which pushed the limit of transcriptomic profiling to the individual cell level, has opened up an entirely new area for regulatory network research. To exploit this new abundant source of data and take advantage of data at single-cell resolution, a number of computational methods have been proposed to uncover the interactions hidden by the averaging process in standard bulk sequencing. In this article, we review 15 such network inference methods developed for single-cell data. We discuss their underlying assumptions, inference techniques, usability, and pros and cons. In an extensive analysis using simulation, we also assess the methods’ performance, sensitivity to dropout and time complexity. The main objective of this survey is to assist not only life scientists in selecting suitable methods for their data and analysis purposes but also computational scientists in developing new methods by highlighting outstanding challenges in the field that remain to be addressed in future development.
Full Text Available
Fast and precise single-cell data analysis using a hierarchical autoencoder

https://doi.org/10.1038/s41467-021-21312-2

Tran, Duc; Nguyen, Hung; Tran, Bang; La Vecchia, Carlo; Luu, Hung N.; Nguyen, Tin (February 2021, Nature Communications)

A primary challenge in single-cell RNA sequencing (scRNA-seq) studies comes from the massive amount of data and the excess noise level. To address this challenge, we introduce an analysis framework, named single-cell Decomposition using Hierarchical Autoencoder (scDHA), that reliably extracts representative information of each cell. The scDHA pipeline consists of two core modules. The first module is a non-negative kernel autoencoder able to remove genes or components that have insignificant contributions to the part-based representation of the data. The second module is a stacked Bayesian autoencoder that projects the data onto a low-dimensional (compressed) space. To diminish the tendency of neural networks to overfit, we repeatedly perturb the compressed space to learn a more generalized representation of the data. In an extensive analysis, we demonstrate that scDHA outperforms state-of-the-art techniques in many research sub-fields of scRNA-seq analysis, including cell segregation through unsupervised learning, visualization of transcriptome landscape, cell classification, and pseudo-time inference.